Question: “How can I use the sf package to handle spatial data?”

Objectives:

  • Show how to create spatial buffers and centroids
  • Demonstrate how to intersect vector data
  • Present the function to retrieve the area of polygons

Packages needed

library(dplyr)
library(sf)
library(tibble)
library(ggplot2)
library(remotes)
remotes::install_github('ropensci/osmdata')
library(osmdata)
library(osmextract)

Introduction

What is sf?

sf is a package which supports simple features (sf), “a standardized way to encode spatial vector data.”. It contains a large set of functions to achieve all the operations on vector spatial data for which you might use traditional GIS software: change the coordinate system, join layers, intersect or unit polygons, create buffers and centroids, etc. cf. the sf cheatsheet.

We are going to go through some of these basics with the case study of Rotterdam buildings.

Conservation in Rotterdam

Let’s focus on really old building and imagine we’re in charge of their conservation. We want to know how much of the city would be affected by a non-construction zone of 500m around pre-1800 buildings.

Let’s select them and see where they are.

assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))

bb <- getbb('Rotterdam', format_out = 'sf_polygon')
x <- opq(bbox = bb) %>% 
   add_osm_feature(key = 'building') %>%
    osmdata_sf ()
buildings <- x$osm_polygons %>% st_transform(.,crs=28992)

old <- 1800 # year prior to which you consider a building old

summary(buildings$start_date)
   Length     Class      Mode 
   253669 character character 
buildings$start_date <- as.numeric(as.character(buildings$start_date))

old_buildings <- buildings |>
  filter(start_date <= old)

 ggplot(data = old_buildings) + geom_sf(colour="red")

Basic GIS operations

As conservationists, we want to create a zone around historical buildings where building regulation will have special restrictions to preserve historical buildings.

Buffer

Let’s say this zone should be 500 meters. In GIS terms, we want to create a buffer around polygons. The corresponding function sf is st_buffer, with 2 arguments: the polygons around which to create buffers, and the radius of the buffer.

 distance <- 500 # in meters 
 
buffer_old_buildings <- 
  st_buffer(x = old_buildings, dist = distance)
 
 ggplot(data = buffer_old_buildings) + geom_sf()

Union

Now, we have a lot of overlapping buffers. We would rather create a unique conservation zone rather than overlapping ones in that case. So we have to fuse the overlapping buffers into one polygon. This operation is called union and the corresponding function is st_union.

 single_old_buffer <- st_union(buffer_old_buildings) |> 
   st_cast(to = "POLYGON") |> 
  st_as_sf() 

single_old_buffer<- single_old_buffer |>
  add_column("ID"=as.factor(1:nrow(single_old_buffer)))  |> 
  st_transform(.,crs=28992) 

We also use st_cast() to explicit the type of the resulting object (POLYGON instead of the default MULTIPOLYGON), st_as_sf() to transform the polygon into an sf object.

Centroids

For the sake of visualisation speed, we would like to represent buildings by a single point (for instance: their geometric centre) rather than their actual footprint. This operation means defining their centroid and the corresponding function is st_centroid.

sf::sf_use_s2(FALSE)
centroids_old <- st_centroid(old_buildings) |> 
  st_transform(.,crs=28992)  

ggplot() + 
    geom_sf(data = single_old_buffer, aes(fill=ID)) +
    geom_sf(data = centroids_old)

Intersection

Now what we would like to distinguish conservation areas based on the number of historic buildings they contain. In GIS terms, we would like to know how many centroids each fused buffer polygon contains. This operation means intersecting the layer of polygons with the layer of points and the corresponding function is st_intersection.

 centroids_buffers <- st_intersection(centroids_old,single_old_buffer) |> 
   add_column(n=1)
 centroid_by_buffer <- centroids_buffers |> 
   group_by(ID) |>
   summarise(n = sum(n))
 single_buffer <- single_old_buffer |> 
   add_column(n_buildings = centroid_by_buffer$n)

  ggplot() + 
   geom_sf(data = single_buffer, aes(fill=n_buildings)) +
   scale_fill_viridis_c(alpha = 0.8,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B")

Final output:

Let’s map this layer over the initial map of individual buildings.

ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Exercise

Challenge: Conservation rules have changed. The historical threshold now applies to all pre-war buildings, but the distance to these building is reduced to 100m. Can you map the number of all buildings per 100m fused buffer?


Solution

 old <- 1939 
 distance <- 100
 #select
 old_buildings <- buildings |>
   filter(start_date <= old)
 #buffer
 buffer_old_buildings <- st_buffer(old_buildings, dist = distance)
  #union
 single_old_buffer <- st_union(buffer_old_buildings) |> 
   st_cast(to = "POLYGON") |> 
   st_as_sf()  
 
 single_old_buffer <- single_old_buffer |>
   add_column("ID"=1:nrow(single_old_buffer))  |> 
   st_transform(single_old_buffer,crs=4326) 
 #centroids
 centroids_old <- st_centroid(old_buildings) |> st_transform(.,crs=4326)  
  #intersection
  centroids_buffers <- st_intersection(centroids_old,single_old_buffer) |> 
   add_column(n=1)
 centroid_by_buffer <- centroids_buffers |> 
   group_by(ID) |>
   summarise(
   n = sum(n)
   )
 single_buffer <- single_old_buffer |> 
   add_column(n_buildings = centroid_by_buffer$n)
  ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Problem: there are many pre-war buildings and the buffers are large so the number of old buildings is not very meaningful. Let’s compute the density of old buildings per buffer zone.

Area

single_buffer$area <- st_area(single_buffer, )  %>% units::set_units(., km^2)

single_buffer$old_buildings_per_km2 <- as.numeric(single_buffer$n_buildings / single_buffer$area)

 ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=old_buildings_per_km2), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 


Summary and Keypoints:

We have seen how to create spatial buffers and centroids, how to intersect vector data and how retrieve the area of polygons.

In short: